Image recognition using hidden Markov models and coupled hidden Markov models

ABSTRACT

An image processing system useful for facial recognition and security identification obtains an array of observation vectors from a facial image to be identified. A Viterbi algorithm is applied to the observation vectors given the parameters of a hierarchical statistical model for each object, and a face is identified by finding a highest matching score between an observation sequence and the hierarchical statistical model.

FIELD OF THE INVENTION

[0001] The present invention relates to image recognition. More particularly, the present invention relates to improved Bayesian networks for image classification.

BACKGROUND

[0002] Identifying a specific object from an image is a pattern recognition task performed at least in a two-dimensional feature space (multispectral techniques can add additional dimensions). This can include character recognition, object detection, or image analysis. Image identification and pattern recognition tasks are particularly necessary for identification and security applications, including identification and analysis of facial features and visual tracking of individuals.

[0003] Facial analysis can include facial feature extraction, representation, and expression recognition. Available facial analysis systems are currently capable of discriminating among different facial expressions, including lip and mouth position. Unfortunately, many such available systems require substantial manual input for best results, especially when low quality video systems are the primary data source. Previous approaches for face recognition have been based on geometric measurements (which can require substantial normalization efforts), template based methods (which have substantial updating problems), and modelling methods (which have accuracy issues).

BRIEF DESCRIPTION OF THE DRAWINGS

[0004] The inventions will be understood more fully from the detailed description given below and from the accompanying drawings of embodiments of the inventions which, however, should not be taken to limit the inventions to the specific embodiments described, but are for explanation and understanding only.

[0005] FIG. 1 schematically illustrates an image classification system;

[0006] FIG. 2 generically illustrates an embedded hidden Markov model-coupled hidden Markov model (HMM-CHMM) structure;

[0007] FIG. 3 is a flow diagram illustrating training of an embedded HMM-CHMM;

[0008] FIG. 4 generically illustrates an embedded coupled hidden Markov model-hidden Markov model (CHMM-HMM) structure; and

[0009] FIG. 5 is a cartoon illustrating block extraction for a facial image.

DETAILED DESCRIPTION

[0010] FIG. 1 generally illustrates a system 10 for data analysis of a data set 12 using an embedded Bayesian network that includes a hidden Markov model (HMM) and a coupled hidden Markov model (CHMM). An embedded Bayesian network is used because it has good generalization performance even for high dimensional input data and small training sets.

[0011] The data set 12 can include static or video imagery 14 containing objects to be identified or classified, including but not limited to textual characters, ideographs, symbols, fingerprints, or even facial imagery 15. The same data set can be optionally used both to train and classify data with the appropriate training module 20 and classification module 22.

[0012] The processing procedure for system 10 may be performed by a properly programmed general-purpose computer alone or in connection with a special purpose computer. Such processing may be performed by a single platform or by a distributed processing platform. In addition, such processing and functionality can be implemented in the form of special purpose hardware, custom application specific integrated circuits (ASICs), configurable FPGA circuits, or in the form of software or firmware being run by a general-purpose or network processor. Data handled in such processing or created as a result of such processing can be stored in any memory as is conventional in the art. By way of example, such data may be stored in a temporary memory, such as in the RAM of a given computer system or subsystem. In addition, or in the alternative, such data may be stored in longer-term storage devices, for example, magnetic disks, rewritable optical disks, and so on. For purposes of the disclosure herein, computer-readable media may comprise any form of data storage mechanism, including such existing memory technologies as well as hardware or circuit representations of such structures and of such data.

[0013] FIG. 2 generically illustrates a logical structure 30 of an embedded hidden Markov model-coupled hidden Markov model (HMM-CHMM). As seen in FIG. 2, the HMM-CHMM is a hierarchical statistical model that includes an HMM parent layer 32 (collectively formed from nodes 33) and a CHMM child layer 34 (collectively formed from nodes 35). The child layer 34 associates one CHMM node 35 to each node 33 in the parent layer 32, and the parameters of the individual CHMMs remain independent from each other. Instead, the parameters of each child layer CHMM depend upon the state of the connected parent node 33. Typically, for multidimensional data sets, the HMM in the parent layer 32 is associated with at least one dimension, and the CHMM child layers are associated with data in an orthogonal dimension with respect to the parent layer.

[0014] Formally defined, the elements of an embedded HMM-CHMM are: an initial super state probability $\pi_{0,0}$ and a super state transition probability from super state j to super state i, $a_{0,i|j}$, where super state refers to the state of the parent layer 32 HMM node 33. For each super state k, the parameters of the corresponding CHMM are defined to have an initial state probability in a channel $c = 1, \ldots, C_1$, $\pi_{1,0}^{k,c}$;

[0015] a state transition probability from state sequence j to state $i_c$, $a_{1,i_c|j}^{k,c}$;

[0016] and an observation probability, $b_{t_0,t_1}^{k,c}(j_c)$.

[0017] In a continuous mixture with Gaussian components, the probability of the observation vector O is given by: $b^{k,c}(j_c) = \sum_{m=1}^{M_j^{k,c}} \omega_{j,m}^{k,c}\, N\left(O, \mu_{j,m}^{k,c}, U_{j,m}^{k,c}\right)$

[0018] where $\mu_{j,m}^{k,c}$

[0019] and $U_{j,m}^{k,c}$

[0020] are the mean and covariance matrix of the mth mixture of the Gaussian mixture corresponding to the jth state in the cth channel, $M_j^{k,c}$

[0021] is the number of mixtures corresponding to the jth state of the cth channel, and $\omega_{j,m}^{k,c}$

[0022] is a weight associated with the corresponding mixture.
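By way of illustration only, the mixture observation probability above may be evaluated as in the following Python sketch. The function names `gaussian_diag` and `observation_probability` are hypothetical, and the sketch assumes diagonal covariance matrices (each covariance is passed as a list of per-dimension variances), which is a simplification and not a requirement of the model.

```python
import math


def gaussian_diag(o, mean, var):
    """Density N(o; mean, diag(var)) of a Gaussian with diagonal covariance.

    Assumption: the covariance matrix U is diagonal, so it is given
    as a list of per-dimension variances.
    """
    log_p = 0.0
    for x, m, v in zip(o, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)


def observation_probability(o, weights, means, variances):
    """b(j) = sum over m of w_m * N(o; mu_m, U_m) for one state of one channel."""
    return sum(
        w * gaussian_diag(o, mu, var)
        for w, mu, var in zip(weights, means, variances)
    )
```

For long observation sequences an implementation may prefer to stay in the log domain to avoid numerical underflow; the sketch returns plain probabilities so that it reads directly against the formula.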

[0023] Observation sequences are used to form observation vectors later used in training and classifying. For example, the observation sequence for a two-dimensional image may be formed from image blocks of size $L_x \times L_y$ that are extracted by scanning the image from left-to-right and top-to-bottom. Adjacent image blocks may be designed to have an overlap of $P_y$ rows in the vertical direction and $P_x$ columns in the horizontal direction. In one possible embodiment, with a block size of $L_y = 8$ rows and $L_x = 8$ columns, six DCT coefficients (a 3×2 low-frequency array) may be employed to create the overlap.

[0024] The resulting array of observation vectors may correspond to a size of T₀×T₁, where T₀ and T₁ are the number of observation vectors extracted along the height (H) and the width (W) of the image, respectively. T₀ and T₁ may be computed accordingly as: $T_0 = \frac{H - L_y}{L_y - P_y} + 1, \quad T_1 = \frac{W - L_x}{L_x - P_x} + 1$
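The block counts above may be illustrated with the following Python sketch. The helper names `num_observations` and `block_origins` are hypothetical, the image size and overlap in the usage note are example values only, and the use of integer (floor) division is an assumption for images whose size is not an exact multiple of the scanning step.

```python
def num_observations(image_len, block_len, overlap):
    """Number of blocks along one image axis: (len - L) / (L - P) + 1.

    Assumption: floor division is used when the step does not divide
    the remaining length exactly.
    """
    step = block_len - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the block length")
    return (image_len - block_len) // step + 1


def block_origins(height, width, l_y, l_x, p_y, p_x):
    """Top-left (row, column) of every block, scanned left-to-right, top-to-bottom."""
    t0 = num_observations(height, l_y, p_y)
    t1 = num_observations(width, l_x, p_x)
    step_y, step_x = l_y - p_y, l_x - p_x
    return [[(r * step_y, c * step_x) for c in range(t1)] for r in range(t0)]
```

For instance, a hypothetical 64×64 image scanned with 8×8 blocks that overlap by 6 rows and 6 columns yields a 29×29 array of block origins, the last of which starts at row 56 and column 56.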

[0025] Consecutive horizontal and vertical observation vectors may also be grouped together to form observation blocks. This may be used as a way to consolidate local observations and at the same time to reduce the total amount of observations. In practice, this data grouping serves application needs and improves recognition efficiency.

[0026] To compute the number of observation blocks, denote the number of observation blocks in the vertical and horizontal directions by $T_0^0$ and $T_1^0$, respectively. Then,

$T_0^0 = 1, \quad T_1^0 = \frac{T_1}{C_1}$

[0027] In addition, denote the number of observation vectors in the horizontal and vertical directions within each observation block by $T_0^1$ and $T_1^1$, respectively, where

$T_0^1 = T_1$

$T_1^1 = C_1$

[0028] Furthermore, denote $O_{t_0,t_1,c}$ as the t₁th observation vector corresponding to the cth channel within the observation block t₀. Although any suitable state sequence segmentation can be used, a modified Viterbi algorithm for the HMM-CHMM is preferred. Application of this modified Viterbi algorithm determines the optimal state and super state segmentation of the observation sequence. The best super state probability for the observation block t₀, given super state i of super channel s, is denoted as $P_{t_0}(i)$. The corresponding optimal state and optimal state sequence $\beta_{t_0,t_1,c}(i)$ may then be computed for each super observation. The following states are first initialized:

$\delta_0(i) = \pi_{0,0}(i)\, P_{t_0}(i)$

$\psi_0(i) = 0$

[0029] The following states are then recursively determined:

$\delta_{t_0}(i) = \max_j \left\{ \delta_{t_0-1}(j)\, a_{0,i|j}\, P_{t_0}(i) \right\}$

$\psi_{t_0}(i) = \arg\max_j \left\{ \delta_{t_0-1}(j)\, a_{0,i|j}\, P_{t_0}(i) \right\}$

[0030] The termination condition is then computed:

$P = \max_i \left\{ \delta_{T_0}(i) \right\}$

$\alpha_{T_0} = \arg\max_i \left\{ \delta_{T_0}(i) \right\}$

[0031] Based on the computed termination condition, a backtracking operation is performed:

$\alpha_{t_0} = \psi_{t_0+1}(\alpha_{t_0+1})$

[0032] $\begin{matrix} q_{t_0,t_1,c}^{0} = \alpha_{t_0} \\ q_{t_0,t_1,c}^{1} = \beta_{t_0,t_1,c}(\alpha_{t_0}) \end{matrix}$
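The initialization, recursion, termination, and backtracking steps over the super states may be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the modified Viterbi algorithm in full: the function name `super_state_viterbi` is hypothetical, the per-block scores `P[t][i]` are taken as given inputs (in the model they would come from the child CHMM of super state i), and `a[i][j]` holds the transition probability from super state j to super state i.

```python
def super_state_viterbi(pi, a, P):
    """Best super state path for a sequence of observation blocks.

    pi[i]   : initial probability of super state i
    a[i][j] : transition probability from super state j to super state i
    P[t][i] : score of observation block t under super state i
              (assumed precomputed; hypothetical input for this sketch)
    Returns (best path probability, list of super states per block).
    """
    n, T = len(pi), len(P)
    delta = [[0.0] * n for _ in range(T)]
    psi = [[0] * n for _ in range(T)]

    # Initialization.
    for i in range(n):
        delta[0][i] = pi[i] * P[0][i]

    # Recursion: keep the best predecessor j for every state i.
    for t in range(1, T):
        for i in range(n):
            best_j = max(range(n), key=lambda j: delta[t - 1][j] * a[i][j])
            delta[t][i] = delta[t - 1][best_j] * a[i][best_j] * P[t][i]
            psi[t][i] = best_j

    # Termination.
    last = max(range(n), key=lambda i: delta[T - 1][i])
    score = delta[T - 1][last]

    # Backtracking from the last block to the first.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return score, path
```

A practical implementation would typically work with log probabilities so that products over many blocks do not underflow; plain probabilities are used here only to mirror the equations.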

[0033] FIG. 3 is a flow diagram 40 illustrating training of an embedded HMM-CHMM based on the Viterbi algorithm, according to embodiments of the present invention. To train an HMM-CHMM based on given training data, observation vectors are first extracted from the training data set and organized in observation blocks (module 42). These observation blocks are uniformly segmented (module 44), the uniform segmentation is replaced by an optimal state segmentation (module 46), the model parameters are estimated (module 48), and the observation likelihood is determined (module 50). As will be appreciated, the training may be iterative, with each training data set used individually and iteratively to update model parameters until the change in the computed observation likelihood between consecutive iterations is smaller than a specified threshold.

[0034] More specifically, the training data set may be segmented along a first dimension according to the number of super states, into a plurality of uniform segments each of which corresponds to a super state. Based on the uniform segmentation at the super layer, the observation vectors within each uniform segment may then be uniformly segmented according to the number of channels and number of states of each child CHMM.
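One possible way to realize such a uniform segmentation along a single dimension is sketched below in Python. The function name `uniform_segmentation` is hypothetical, and the handling of lengths that do not divide evenly (earlier segments receive the extra items) is an assumption of this sketch.

```python
def uniform_segmentation(num_items, num_segments):
    """Assign consecutive items 0..num_items-1 to near-equal segments.

    Returns a list whose entry t is the segment (for example, the
    super state) initially assigned to item t.
    """
    if not 0 < num_segments <= num_items:
        raise ValueError("need 0 < num_segments <= num_items")
    return [
        min(t * num_segments // num_items, num_segments - 1)
        for t in range(num_items)
    ]
```

For example, ten observation blocks and five super states give the initial labels 0, 0, 1, 1, 2, 2, 3, 3, 4, 4. The same helper may be applied again within each segment to assign child states.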

[0035] The density function of each state (including both super states as well as child states) may be initialized before the training takes place. For example, if a Gaussian mixture model is adopted for each state, Gaussian parameters for each of the mixture components may need to be initialized. Different approaches may be employed to achieve the initialization of model parameters. For example, one embodiment may be implemented where the observation sequence assigned to each channel c and state j, and super state k and super channel s may be assigned to $M_j^{k,c}$

[0036] clusters using, for example, the K-means algorithm.

[0037] During the process of training, the original uniform segmentation is updated based on the optimal state segmentation using the Viterbi algorithm or other suitable algorithms. To update the density function of a state, particular relevant parameters to be updated may be determined prior to the update operation.

[0038] The selection of a Gaussian mixture component for each state j, channel c, and super state k is also required. One exemplary criterion to make the selection may correspond to assigning the observation $O_{t_0,t_1,c}^{(r)}$

[0039] from the rth training sample in the training set to the Gaussian component for which the Gaussian density function $N\left(O_{t_0,t_1,c}^{(r)};\, \mu_{j,m}^{k,c},\, U_{j,m}^{k,c}\right)$

[0040] is the highest.

[0041] The parameters are then estimated using, for example, an extension of the segmental K-means algorithm. In particular, the estimated transition probability $a'_{0,i|j}$ between super states i and j may be obtained as follows: $a'_{0,i|j} = \frac{\sum_{r} \sum_{t_0} \sum_{t_1} \epsilon_{t_0}^{(r)}(i,j)}{\sum_{r} \sum_{t_0} \sum_{t_1} \sum_{l} \epsilon_{t_0}^{(r)}(i,l)}$

[0042] where $\epsilon_{t_0}^{(r)}(i,l)$

[0043] equals one if a transition from super state l to the super state i occurs for the observation block (t₀) and zero otherwise. The estimated transition probabilities $a_{1,i_c|j}^{\prime\,k,c}$

[0044] from embedded state sequence j to the embedded state $i_c$ in channel c of super state k may then be obtained as follows: $a_{1,i_c|j}^{\prime\,k,c} = \frac{\sum_{r} \sum_{t_0} \sum_{t_1} \theta_{t_0,t_1}^{(r)}(k,c,i_c,j)}{\sum_{r} \sum_{t_0} \sum_{t_1} \sum_{l} \theta_{t_0,t_1}^{(r)}(k,c,i_c,l)}$

[0045] where $\theta_{t_0,t_1}^{(r)}(k,c,i_c,l)$

[0046] equals one if in the observation block (t₀) from the rth training sample a transition from state sequence l to state $i_c$ in channel c occurs for the observation $O_{t_0,t_1,c}^{(r)}$

[0047] and zero otherwise.
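The indicator sums above amount to counting transitions along the segmented state paths and normalizing the counts. The following Python sketch illustrates this for a single layer. The function name `estimate_transitions` is hypothetical, and the sketch normalizes the counts over destination states so that the probabilities of leaving each state sum to one; that is the usual convention for count-based estimates and is an assumption of this sketch about how the normalizing sum is to be read.

```python
def estimate_transitions(paths, num_states):
    """Count-based transition estimate from segmented state paths.

    paths      : list of state sequences, one per training sample
    num_states : number of states in the layer
    Returns a[i][j], the estimated probability of moving from state j
    to state i. Columns with no observed transitions are left at zero.
    """
    # counts[i][j] accumulates the indicator "a transition j -> i occurred".
    counts = [[0] * num_states for _ in range(num_states)]
    for path in paths:
        for prev, cur in zip(path, path[1:]):
            counts[cur][prev] += 1

    a = [[0.0] * num_states for _ in range(num_states)]
    for j in range(num_states):
        total = sum(counts[i][j] for i in range(num_states))
        for i in range(num_states):
            a[i][j] = counts[i][j] / total if total else 0.0
    return a
```

With the two toy paths `[0, 0, 1]` and `[0, 1, 1]`, state 0 is left three times (once to itself, twice to state 1), so the estimate from state 0 is 1/3 for staying and 2/3 for moving on.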

[0048] The parameters of the selected Gaussian mixture component may also be accordingly updated. The involved Gaussian parameters may include a mean vector $\mu_{j,m}^{\prime\,k,c}$,

[0049] covariance matrix $U_{j,m}^{\prime\,k,c}$

[0050] of the Gaussian mixture, and the mixture coefficients $\omega_{j,m}^{\prime\,k,c}$

[0051] for mixture m of state j, channel c, and super state k. The updated Gaussian parameters may be obtained according to the following formulations: $\begin{matrix} \mu_{j,m}^{\prime\,k,c} = \frac{\sum_{r} \sum_{t_0} \sum_{t_1} \psi_{t_0,t_1}^{(r)}(k,c,j,m)\, O_{t_0,t_1,c}^{(r)}}{\sum_{r} \sum_{t_0} \sum_{t_1} \psi_{t_0,t_1}^{(r)}(k,c,j,m)} \\ U_{j,m}^{\prime\,k,c} = \frac{\sum_{r} \sum_{t_0} \sum_{t_1} \psi_{t_0,t_1}^{(r)}(k,c,j,m) \left( O_{t_0,t_1,c}^{(r)} - \mu_{j,m}^{\prime\,k,c} \right) \left( O_{t_0,t_1,c}^{(r)} - \mu_{j,m}^{\prime\,k,c} \right)^{\top}}{\sum_{r} \sum_{t_0} \sum_{t_1} \psi_{t_0,t_1}^{(r)}(k,c,j,m)} \\ \omega_{j,m}^{\prime\,k,c} = \frac{\sum_{r} \sum_{t_0} \sum_{t_1} \psi_{t_0,t_1}^{(r)}(k,c,j,m)}{\sum_{r} \sum_{t_0} \sum_{t_1} \sum_{m=1}^{M} \psi_{t_0,t_1}^{(r)}(k,c,j,m)} \end{matrix}$

[0052] where $\psi_{t_0,t_1}^{(r)}(k,c,j,m)$

[0053] equals one if the observation $O_{t_0,t_1,c}^{(r)}$

[0054] is assigned to super state k, state j in channel c and mixture component m, and zero otherwise.
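Because the indicator is either one or zero, the three updates reduce to a per-component mean, covariance, and relative count over the observations assigned to that component. The following Python sketch illustrates this for the observations of one state in one channel of one super state. The function name `update_gaussian_mixture` and the dictionary layout of the result are hypothetical, and only the diagonal of each covariance matrix is computed, which is an assumption of this sketch.

```python
def update_gaussian_mixture(observations, assignments, num_mixtures):
    """Re-estimate mixture parameters from hard assignments.

    observations : list of observation vectors assigned to one
                   (super state, channel, state) combination
    assignments  : mixture component index chosen for each observation
    Returns one dict per component with "weight", "mean", and "var"
    (diagonal covariance; an assumption of this sketch), or None for
    a component that received no observations.
    """
    dim = len(observations[0])
    total = len(observations)
    params = []
    for m in range(num_mixtures):
        members = [o for o, a in zip(observations, assignments) if a == m]
        n = len(members)
        if n == 0:
            params.append(None)  # nothing assigned; left for the caller to handle
            continue
        mean = [sum(o[d] for o in members) / n for d in range(dim)]
        var = [sum((o[d] - mean[d]) ** 2 for o in members) / n for d in range(dim)]
        params.append({"weight": n / total, "mean": mean, "var": var})
    return params
```

A component with a single observation receives zero variance in this sketch; an implementation would normally apply a variance floor before using such a component in a density evaluation.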

[0055] The update of parameters based on a training sample may be carried out iteratively. This may be necessary because the Viterbi algorithm may yield different optimal segmentation during each iteration before convergence. Between two consecutive iterations, if the difference of observation likelihood computed with the Viterbi algorithm is smaller than a specified threshold, the iteration may be terminated. The HMM-CHMM has a complexity that is quadratic with respect to the number of states in the model. In addition, the HMM-CHMM may be efficiently implemented in a parallel fashion.

[0056] An alternative logical structure that includes an embedded CHMM-HMM (in contrast to an HMM-CHMM) is generically illustrated by FIG. 4, which shows a logical structure 60 of an embedded coupled hidden Markov model-hidden Markov model. As seen in FIG. 4, the CHMM-HMM is a hierarchical statistical model that includes a CHMM parent layer 62 (collectively formed from nodes 63) and an HMM child layer 64 (collectively formed from nodes 65). The child layer 64 associates one HMM node 65 to each node 63 in the parent layer 62, and the parameters of the individual HMMs remain independent from each other. Instead, the parameters of each child layer HMM depend upon the state of the connected parent node 63. Typically, for multidimensional data sets, the CHMM in the parent layer 62 is associated with at least one dimension, and the HMM child layers are associated with data in an orthogonal dimension with respect to the parent layer. With appropriate changes, training of the CHMM-HMM can proceed in a manner similar to that discussed in reference to training of HMM-CHMM image recognition systems.

[0057] In one embodiment of face image parameterization and observation block extraction illustrated with respect to FIG. 5, a facial image 80 (represented as a cartoon face in the Figure) is the image analysis target. Observations are formed from 8×8 image blocks extracted by scanning the image from left-to-right and top-to-bottom. Adjacent image blocks 82 overlap in the horizontal and vertical directions, and six DCT coefficients (a 3×2 low-frequency array) are employed to create the overlap. The resulting array of observation vectors corresponds to a size of T₀×T₁, where T₀ and T₁ are the number of observation vectors extracted along the height (H) and the width (W) of the image, respectively. T₀ and T₁ may be computed as earlier described with reference to observation vector calculation.

[0058] Training proceeds by creating a face model formed by defining two (2) channels in the CHMM and five (5) super states in the HMM supporting each of the CHMM channels. The number of super states in the HMM and in each CHMM is set to three (3), and all covariance matrices are diagonal. Images used in training correspond to multiple instances of the same person. Given a new person not previously found in the database, and not used in previous training of the model, the observation vectors are determined, and the Viterbi algorithm is applied to the observation sequence given the parameters of each of the embedded HMM-CHMMs. The highest matching score between the observation sequence and the trained face models identifies the test image. Using a standard facial image database, training on 10 images per person, and using five different face images for testing, recognition rates greater than 80% have been achieved.
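The final identification step, selecting the trained model with the highest matching score, may be sketched in Python as follows. The function name `identify` and the `score_fn` parameter are hypothetical: `score_fn` stands in for whatever routine applies the Viterbi algorithm to the observation sequence under one model and returns its score, and is supplied by the caller.

```python
def identify(observations, models, score_fn):
    """Return the label of the best-matching model and its score.

    observations : observation vectors extracted from the test image
    models       : dict mapping a label (for example, a person) to a
                   trained model
    score_fn     : callable(observations, model) -> matching score;
                   a hypothetical stand-in for Viterbi scoring
    """
    best_label, best_score = None, float("-inf")
    for label, model in models.items():
        score = score_fn(observations, model)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Keeping the scoring routine as a parameter lets the same selection step serve either the HMM-CHMM or the CHMM-HMM structure.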

[0059] Since non-intrusive video or picture based security systems are frequently able to provide distinguishable pictures of faces from various angles, the ability to provide high probability personal identification from face imagery is valuable for security and tracking purposes. The method of the present invention can be decomposed for efficient implementation in parallel architecture systems, and since it has a complexity that varies quadratically (rather than exponentially) with the number of states of the model, large state models can be maintained.

[0060] As will be understood, reference in this specification to “an embodiment,” “one embodiment,” “some embodiments,” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the invention. The various appearances of “an embodiment,” “one embodiment,” or “some embodiments” are not necessarily all referring to the same embodiments.

[0061] If the specification states a component, feature, structure, or characteristic “may”, “might”, or “could” be included, that particular component, feature, structure, or characteristic is not required to be included. If the specification or claim refers to “a” or “an” element, that does not mean there is only one of the element. If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.

[0062] Those skilled in the art having the benefit of this disclosure will appreciate that many other variations from the foregoing description and drawings may be made within the scope of the present invention. Accordingly, it is the following claims, including any amendments thereto, that define the scope of the invention.

The claimed invention is:
 1. An image processing method, comprising: forming from multiple images a hierarchical statistical model for each object to be identified in an image training database, the hierarchical statistical model having a parent layer with multiple supernodes associated with a first image direction and a child layer having multiple nodes associated with each supernode of the parent layer and a second image direction, obtaining an array of observation vectors from an image to be identified, applying a Viterbi algorithm to the observation vectors given the parameters of the hierarchical statistical model for each object, and identifying an object by finding a highest matching score between an observation sequence and the hierarchical statistical model.
 2. The method according to claim 1, wherein the objects are faces.
 3. The method according to claim 1, wherein the parent layer is formed from hidden Markov models and the child layer is formed from coupled hidden Markov models.
 4. The method according to claim 1, wherein the parent layer is formed from coupled hidden Markov models and the child layer is formed from hidden Markov models.
 5. A face recognition method, comprising: obtaining an array of observation vectors from a facial image to be identified, applying a Viterbi algorithm to the observation vectors given the parameters of a hierarchical statistical model for each object, and identifying a face by finding a highest matching score between an observation sequence and a hierarchical statistical model.
 6. The method according to claim 5, wherein the hierarchical statistical model has a parent layer formed from a hidden Markov model (HMM) and a child layer formed from a coupled hidden Markov model (CHMM).
 7. The method according to claim 5, wherein the hierarchical statistical model has a parent layer formed from a coupled hidden Markov model (CHMM) and a child layer formed from a hidden Markov model (HMM).
 8. An article comprising a storage medium having stored thereon instructions that when executed by a machine result in: forming from multiple images a hierarchical statistical model for each object to be identified in an image training database, the hierarchical statistical model having a parent layer with multiple supernodes associated with a first image direction and a child layer having multiple nodes associated with each supernode of the parent layer and a second image direction, obtaining an array of observation vectors from an image to be identified, applying a Viterbi algorithm to the observation vectors given the parameters of the hierarchical statistical model for each object, and identifying an object by finding a highest matching score between an observation sequence and the hierarchical statistical model.
 9. The article comprising a storage medium having stored thereon instructions of claim 8, wherein the objects are faces.
 10. The article comprising a storage medium having stored thereon instructions of claim 8, wherein the parent layer is formed from hidden Markov models and the child layer is formed from coupled hidden Markov models.
 11. The article comprising a storage medium having stored thereon instructions of claim 8, wherein the parent layer is formed from coupled hidden Markov models and the child layer is formed from hidden Markov models.
 12. An article comprising a storage medium having stored thereon instructions that when executed by a machine result in: obtaining an array of observation vectors from a facial image to be identified, applying a Viterbi algorithm to the observation vectors given the parameters of a hierarchical statistical model for each object, and identifying a face by finding a highest matching score between an observation sequence and a hierarchical statistical model.
 13. The article comprising a storage medium having stored thereon instructions of claim 12, wherein the hierarchical statistical model has a parent layer formed from a hidden Markov model (HMM) and a child layer formed from a coupled hidden Markov model (CHMM).
 14. The article comprising a storage medium having stored thereon instructions of claim 12, wherein the hierarchical statistical model has a parent layer formed from a coupled hidden Markov model (CHMM) and a child layer formed from a hidden Markov model (HMM).
 15. An image processing system comprising: an image training database having a hierarchical statistical model for each object to be identified, the hierarchical statistical model having a parent layer with multiple supernodes associated with a first image direction and a child layer having multiple nodes associated with each supernode of the parent layer and a second image direction, a classification module that obtains an array of observation vectors from an image to be identified and tests it for identity against the image training database by applying a Viterbi algorithm to the observation vectors given the parameters of the hierarchical statistical model for each object, and identifying an object by finding a highest matching score between an observation sequence and the hierarchical statistical model in the image training database.
 16. The image processing system according to claim 15, wherein the objects are faces.
 17. The image processing system according to claim 15, wherein the parent layer is formed from hidden Markov models and the child layer is formed from coupled hidden Markov models.
 18. The image processing system according to claim 15, wherein the parent layer is formed from coupled hidden Markov models and the child layer is formed from hidden Markov models.
 19. A face recognition system, comprising: a database of observation vectors from facial images to be identified, a classification module that applies a Viterbi algorithm to the observation vectors in the database given the parameters of a hierarchical statistical model for each object, and identifies a face by finding a highest matching score between an observation sequence and a hierarchical statistical model.
 20. The face recognition system according to claim 19, wherein the hierarchical statistical model has a parent layer formed from a hidden Markov model (HMM) and a child layer formed from a coupled hidden Markov model (CHMM).
 21. The face recognition system according to claim 19, wherein the hierarchical statistical model has a parent layer formed from a coupled hidden Markov model (CHMM) and a child layer formed from a hidden Markov model (HMM).